Machine Learning - Random Search


This article explains the concept of Random Search, one of the hyperparameter tuning methods used to maximize the performance of machine learning models, and introduces an example implementation using the Scikit-learn library.

Random Search is a method that evaluates randomly selected combinations within a given parameter space to find the optimal combination of hyperparameters. Unlike Grid Search, which systematically explores all combinations of specified parameters, Random Search evaluates combinations randomly selected from the search space. This method is particularly useful when the dimension of hyperparameters is high or the search space is large, often yielding similar or better results in less time.
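To make the difference concrete, the sketch below contrasts the two approaches using Scikit-learn's ParameterGrid and ParameterSampler helpers. The specific grid values and n_iter below are illustrative choices, not part of the article's later example:

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler
from scipy.stats import uniform

# Grid Search enumerates every combination: 4 * 3 * 2 = 24 candidates.
grid = {'C': [0.1, 1, 10, 100],
        'gamma': [0.001, 0.01, 0.1],
        'kernel': ['linear', 'rbf']}
print(len(ParameterGrid(grid)))  # 24

# Random Search draws a fixed number of candidates from the space,
# regardless of how large (or continuous) that space is.
distributions = {'C': uniform(0.1, 1000),
                 'gamma': uniform(0.0001, 0.1),
                 'kernel': ['linear', 'rbf']}
samples = list(ParameterSampler(distributions, n_iter=10, random_state=42))
print(len(samples))  # 10
```

Note that the cost of Random Search is controlled directly by n_iter, while the cost of Grid Search grows multiplicatively with every parameter added.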

Key Parameters #

Random Search can be implemented through the RandomizedSearchCV class in Scikit-learn. The key parameters are as follows:

  • estimator: Specifies the model to optimize. For example, RandomForestClassifier(), SVC(), etc.
  • param_distributions: Specifies the parameter space to explore. You can specify a continuous distribution for each parameter or provide a list.
  • n_iter: Specifies the number of parameter settings to be randomly selected. The larger this value, the more combinations will be explored, but computation time will also increase.
  • scoring: Specifies the criterion for evaluating the model’s performance. For example, ‘accuracy’, ‘f1’, etc.
  • cv: Specifies the strategy for cross-validation splitting. Entering an integer value will perform k-fold cross-validation with that value.
  • random_state: Specifies the seed value for the random number generator to ensure reproducibility of the results.
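A quick sketch of what a "continuous distribution" means for param_distributions. Any scipy.stats distribution object with an rvs method works; note that scipy's uniform(loc, scale) samples from [loc, loc + scale], not [loc, scale]. The loguniform alternative shown here is an assumption on my part, though it is commonly suggested for scale-like parameters such as C and gamma:

```python
from scipy.stats import uniform, loguniform

# uniform(loc, scale) draws from the interval [loc, loc + scale],
# so uniform(0.1, 1000) covers roughly [0.1, 1000.1].
c_sample = uniform(0.1, 1000).rvs(random_state=0)
print(c_sample)

# loguniform spreads samples evenly across orders of magnitude,
# which can suit parameters that vary multiplicatively.
gamma_sample = loguniform(1e-4, 1e-1).rvs(random_state=0)
print(gamma_sample)
```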

Implementing RandomizedSearchCV #

Below is an example of using RandomizedSearchCV to find the optimal hyperparameters for a classifier. In the code below, we use a Support Vector Machine (SVM).

from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.datasets import load_iris
import scipy.stats as stats

# Load the dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Instantiate the Support Vector Machine
svc = SVC()

# Define the hyperparameter space to explore
param_distributions = {
    'C': stats.uniform(0.1, 1000),
    'gamma': stats.uniform(0.0001, 0.1),
    'kernel': ['linear', 'rbf']
}

# Instantiate RandomizedSearchCV
random_search = RandomizedSearchCV(
    estimator=svc,
    param_distributions=param_distributions,
    n_iter=50,
    scoring='accuracy',
    cv=5,
    random_state=42
)

# Run the search on the training data
random_search.fit(X_train, y_train)

# Output the best parameters and highest cross-validation accuracy
print("Best parameters:", random_search.best_params_)
print("Best cross-validation score: {:.2f}".format(random_search.best_score_))

# Evaluate performance on the test set
accuracy = random_search.score(X_test, y_test)
print("Test set accuracy: {:.2f}".format(accuracy))

This code searches for the optimal combination by evaluating random combinations of the specified C, gamma, and kernel hyperparameters for the SVC model. RandomizedSearchCV can be a more effective method than Grid Search, especially when the search space is wide.